Week 4: Overfitting/MCMC

Overfitting

A central tension in our modeling is between explanation – building good causal models – and prediction. In his lecture, McElreath builds the intuition that the models that predict best often do a terrible job of representing the causal structure. So the tools covered in this lecture should be treated as tools for prediction, not for identifying causal models.

When trying to maximize prediction, we need to be wary of overfitting – the model learning too much from the sample. Methods for avoiding overfitting favor simpler models. We'll make use of regularizing priors, which keep the model from getting too excited about any one data point. We'll also discuss scoring devices, like information criteria and cross-validation.

Code
# packages: rethinking for quap(), tidyverse for ggplot2 and %>%, patchwork for |
library(rethinking)
library(tidyverse)
library(patchwork)

sppnames <- c( "afarensis","africanus","habilis","boisei","rudolfensis","ergaster","sapiens" )
brainvolcc <- c( 438 , 452 , 612 , 521 , 752 , 871 , 1350 )
masskg <- c( 37.0 , 35.5 , 34.5 , 41.5 , 55.5 , 61.0 , 53.5 )
d <- data.frame( species=sppnames , brain=brainvolcc , mass=masskg )

base <- d %>%
  ggplot(aes(x=mass, y=brain)) +
  geom_point() +
  geom_text(aes(label=species), hjust=0, nudge_x=1) +
  labs(x="body mass (kg)", y="brain volume (cc)")
p1 <- base + geom_smooth(method='lm', se=FALSE) + ggtitle("Simple linear model")
p2 <- base + geom_smooth(method='lm', se=FALSE, formula=y~poly(x, 6)) + ggtitle("6th degree polynomial")
(p1 | p2)
Code
d$mass_std <- (d$mass - mean(d$mass))/sd(d$mass)
d$brain_std <- d$brain / max(d$brain)

m7.1 <- quap(
  alist(
    brain_std ~ dnorm( mu , exp(log_sigma) ),
    mu <- a + b*mass_std,
    a ~ dnorm( 0.5 , 1 ),
    b ~ dnorm( 0 , 10 ),
    log_sigma ~ dnorm( 0 , 1 )
  ), data=d )

m7.6 <- quap(
  alist(
    brain_std ~ dnorm( mu , 0.001 ),
    mu <- a + b[1]*mass_std + b[2]*mass_std^2 +
      b[3]*mass_std^3 + b[4]*mass_std^4 +
      b[5]*mass_std^5 + b[6]*mass_std^6,
    a ~ dnorm( 0.5 , 1 ),
    b ~ dnorm( 0 , 10 )
  ), data=d , start=list(b=rep(0,6)) )

par(mfrow=c(1,2))
brain_loo_plot(m7.1)
brain_loo_plot(m7.6)
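As a quick base-R illustration (my own sketch, not from the lecture code), the same data refit with `lm()` at increasing polynomial degree shows why in-sample fit can't guard against overfitting: R² only goes up as the model gets more flexible.

```r
# Hypothetical sketch: in-sample R^2 for polynomial fits of degree 1 through 6.
brain <- c(438, 452, 612, 521, 752, 871, 1350)
mass  <- c(37.0, 35.5, 34.5, 41.5, 55.5, 61.0, 53.5)
r2 <- sapply(1:6, function(k) summary(lm(brain ~ poly(mass, k)))$r.squared)
round(r2, 3)
# R^2 never decreases with degree; the degree-6 model (7 parameters,
# 7 data points) fits the sample essentially perfectly (R^2 ~ 1).
```

The degree-6 model "wins" on this score despite being the absurd fit pictured above – in-sample accuracy rewards memorizing the sample.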

the path to model performance criteria

  1. establish a measurement scale for the distance from perfect accuracy
    • this requires a detour through information theory
  2. establish deviance as an approximation of relative distance from perfect accuracy
  3. establish that we only care about out-of-sample deviance
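As a small preview of the information-theory detour (my own illustrative numbers, not from the lecture): information entropy, \(H(p) = -\sum_i p_i \log p_i\), is the measure of uncertainty that the distance scale will be built on.

```r
# Illustrative sketch: information entropy H(p) = -sum(p_i * log(p_i)).
# Suppose it rains on 3 days in 10, so p = (0.3, 0.7) for (rain, sun).
p <- c(0.3, 0.7)
H <- -sum(p * log(p))
H  # about 0.61 nats
```

More uncertain distributions (closer to 50/50) have higher entropy; a certain forecast has entropy 0.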

First: establishing a measurement scale. The two major dimensions to consider are:

  • cost-benefit analysis

    • how much does it cost when we are wrong?

    • how much do we win when we’re right?

  • accuracy in context

    • judging accuracy in a way that accounts for how much a model could possibly improve prediction

clash of the weatherpeople

Each forecaster's number is their announced probability of rain:

Day   Current P(rain)   New P(rain)   Outcome
 1          1.0              0.0       rain
 2          1.0              0.0       rain
 3          1.0              0.0       rain
 4          0.6              0.0       sun
 5          0.6              0.0       sun
 6          0.6              0.0       sun
 7          0.6              0.0       sun
 8          0.6              0.0       sun
 9          0.6              0.0       sun
10          0.6              0.0       sun

If accuracy is the chance of a correct prediction:

\(\text{Current} = [(3 \times 1.0) + (7 \times 0.4)]/10 = 0.58\)

\(\text{New} = [(3 \times 0.0) + (7 \times 1.0)]/10 = 0.70\)

On sun days the current weatherman is right with probability \(1 - 0.6 = 0.4\), while the new weatherman – who never predicts rain – is always right. By this hit-rate measure, the new weatherman comes out ahead.
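The arithmetic can be checked directly (a sketch with hypothetical vector names; each forecast vector holds the announced P(rain) for the ten days):

```r
# Forecast P(rain) for each of the 10 days, and the actual outcomes.
pred_current <- c(rep(1.0, 3), rep(0.6, 7))
pred_new     <- rep(0.0, 10)
rain         <- c(rep(TRUE, 3), rep(FALSE, 7))

# Chance of a correct call: P(rain) on rain days, 1 - P(rain) on sun days.
hit_rate <- function(p_rain, rain) mean(ifelse(rain, p_rain, 1 - p_rain))
hit_rate(pred_current, rain)  # 0.58
hit_rate(pred_new, rain)      # 0.70
```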